In short: constant mean, constant variance, constant autocorrelations.
The series should also be approximately horizontal with constant variance.
True
Serially correlated observations have several drawbacks, the main one being unreliable hypothesis testing. Serial correlation biases the estimated variances of the regression coefficients, which in turn makes hypothesis tests unreliable. In essence, t-statistics appear more significant than they really are.
A major advantage is that serial correlation indicates the observations are not purely random. This is useful because the analyst can model that structure and reduce the error, since the correlated portion of the observations is no longer treated as noise.
The Durbin-Watson test is a test statistic used to detect autocorrelation at lag 1 in the residuals of a regression. The test is useful because autocorrelation, the similarity of a time series over successive time intervals, can lead to underestimated standard errors and predictors that appear overly significant, as I discussed in Question 3.
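As a quick illustration (a sketch on simulated data, not the assignment's data), the Durbin-Watson statistic can be computed by hand; it is approximately 2(1 - rho1), so values far below 2 signal positive lag-1 autocorrelation:

```r
# Durbin-Watson by hand on simulated AR(1) "residuals" (illustrative only)
set.seed(1)
e <- as.numeric(arima.sim(list(ar = 0.8), 200))  # serially correlated errors
DW <- sum(diff(e)^2) / sum(e^2)                  # Durbin-Watson statistic
DW                                               # far below 2
2 * (1 - acf(e, plot = FALSE)$acf[2])            # the approximation 2(1 - rho1)
```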
AR1<-arima.sim(list(ar=c(0.8)),10000) #AR1 is just a vector of values. They will be centered around 0 unless we add a shift or include some other source of variation like a predictor.
par(mfrow=c(1,3))
plot(1:10000,AR1,type="l")
acf(AR1,main="ACF")
pacf(AR1,main="PACF")
The simulated data exhibit the same behavior in the ACF and PACF plots that an AR(1) process should. The lag-1 autocorrelation is roughly 0.8, which we can verify from the plots above.
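A numeric check of that claim (a sketch; it re-simulates a series rather than reusing the AR1 object above):

```r
# Verify numerically that the lag-1 sample autocorrelation is close to 0.8
set.seed(1)
AR1 <- arima.sim(list(ar = c(0.8)), 10000)
rho1 <- acf(AR1, plot = FALSE)$acf[2]  # element 1 is lag 0, element 2 is lag 1
round(rho1, 2)
```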
#Question 7, Round 1
AR1<-arima.sim(list(ar=c(0.8)),50)
par(mfrow=c(1,3))
plot(1:50,AR1,type="l")
acf(AR1,main="ACF")
pacf(AR1,main="PACF")
#Question 7, Round 2
AR1<-arima.sim(list(ar=c(0.8)),50)
par(mfrow=c(1,3))
plot(1:50,AR1,type="l")
acf(AR1,main="ACF")
pacf(AR1,main="PACF")
#Question 7, Round 3
AR1<-arima.sim(list(ar=c(0.8)),50)
par(mfrow=c(1,3))
plot(1:50,AR1,type="l")
acf(AR1,main="ACF")
pacf(AR1,main="PACF")
Based on the repeated runs above, with a smaller dataset the ACF and PACF differ noticeably from round to round. The differences are due to the small sample size: the AR(1) series plot changes a lot between rounds, and it is harder to see the values settle around an average.
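That sampling variability can be quantified directly (a sketch): repeating the n = 50 simulation many times shows how widely the lag-1 autocorrelation estimate scatters.

```r
# How variable is the lag-1 ACF estimate when n = 50?
set.seed(1)
rho1.hat <- replicate(500, {
  x <- arima.sim(list(ar = c(0.8)), 50)
  acf(x, plot = FALSE)$acf[2]
})
summary(rho1.hat)  # estimates scatter widely around (and below) the true 0.8
sd(rho1.hat)
```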
#Question 8
AR1<-arima.sim(list(ar=c(0.0)),50) #With the AR coefficient set to 0 this is just white noise; the warning below is a harmless side effect of the stationarity check when ar=0.
## Warning in min(Mod(polyroot(c(1, -model$ar)))): no non-missing arguments to min;
## returning Inf
par(mfrow=c(1,3))
plot(1:50,AR1,type="l")
acf(AR1,main="ACF")
pacf(AR1,main="PACF")
Since there is no serial correlation present in the data, this code adequately represents the results we would see from actually fitting a time series model to the data. The residuals do a good job of behaving in an unexciting manner.
#AR2 Behavior
rho1<-.8
rho2<-.6
a1<-(rho1*(1-rho2)/(1-rho1^2))
a2<-(rho2-rho1^2)/(1-rho1^2)
AR2<-arima.sim(list(ar=c(a1,a2)),10000)
par(mfrow=c(1,3))
plot(1:10000,AR2,type="l")
acf(AR2)
pacf(AR2,main="PACF")
Rules of thumb for the AR(2) model. Based on the plots of the ACF and the PACF, these plots meet the general rules, including the ACF tailing off gradually and the PACF cutting off after p = 2 lags.
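The a1 and a2 used above come from inverting the Yule-Walker equations for an AR(2), which link the target autocorrelations rho1 = 0.8 and rho2 = 0.6 to the AR coefficients:

```latex
\begin{aligned}
\rho_1 &= a_1 + a_2\rho_1,\\
\rho_2 &= a_1\rho_1 + a_2
\end{aligned}
\qquad\Longrightarrow\qquad
a_1=\frac{\rho_1(1-\rho_2)}{1-\rho_1^{2}},\qquad
a_2=\frac{\rho_2-\rho_1^{2}}{1-\rho_1^{2}}.
```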
#AR3 Behavior
a1<-1.5
a2<--1.21
a3<-.46
AR3<-arima.sim(list(ar=c(a1,a2,a3)),10000)
par(mfrow=c(1,3))
plot(1:10000,AR3,type="l")
acf(AR3,main="ACF")
pacf(AR3,main="PACF")
Rules of thumb for the AR(3) model. Based on the plots of the ACF and the PACF we have a few issues with the general rules of thumb: the ACF does not tail off gradually, and the PACF does not cut off cleanly after p = 3 lags.
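A quick way to confirm these AR(3) coefficients are at least stationary (a sketch; arima.sim would have refused them otherwise) is to check that all roots of the characteristic polynomial lie outside the unit circle:

```r
# Stationarity check: roots of 1 - a1*z - a2*z^2 - a3*z^3 must have modulus > 1
a1 <- 1.5; a2 <- -1.21; a3 <- 0.46
roots <- polyroot(c(1, -a1, -a2, -a3))
Mod(roots)             # moduli of the three roots
all(Mod(roots) > 1)    # TRUE => the AR(3) process is stationary
```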
#ARMA(3,2) Behavior
a1<-1.5
a2<--1.21
a3<-.46
b1<--.2
b2<--.9
ARMA32<-arima.sim(list(ar=c(a1,a2,a3),ma=c(b1,b2)),10000)
par(mfrow=c(1,3))
plot(1:10000,ARMA32,type="l")
acf(ARMA32,main="ACF")
pacf(ARMA32,main="PACF")
Rules of thumb for the ARMA(3,2) model. Based on the plots of the ACF and the PACF, these plots meet the general rules: both the ACF and the PACF tail off gradually rather than cutting off sharply.
#MA(2) (Behavior)
b1<- .2
b2<- .9
MA2<-arima.sim(list(ma=c(b1,b2)),10000)
par(mfrow=c(1,3))
plot(1:10000,MA2,type="l")
acf(MA2,main="ACF")
pacf(MA2,main="PACF")
Rules of thumb for the MA(2) model. Based on the plots of the ACF and the PACF, these plots follow the general rules. The ACF plot cuts off after q = 2 lags, and the PACF trails off gradually.
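The cut-off is exact in theory: stats::ARMAacf gives the theoretical ACF of this MA(2), which is identically zero past lag q = 2 (a sketch reusing the b1 and b2 above):

```r
# Theoretical ACF of the MA(2) with b1 = 0.2, b2 = 0.9
b1 <- 0.2; b2 <- 0.9
rho <- ARMAacf(ma = c(b1, b2), lag.max = 4)
round(rho, 3)  # lags 0 through 4; lags 3 and 4 are exactly 0
```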
library(tseries)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(forecast)
library(ggplot2)
bills<-read.csv("ElectricBill.csv")
head(bills)
## Date Bill AvgTemp
## 1 5/1/2015 383.37 71.40
## 2 6/1/2015 490.36 78.60
## 3 7/1/2015 481.11 80.00
## 4 8/1/2015 517.13 86.35
## 5 9/1/2015 381.30 79.15
## 6 10/1/2015 253.40 67.95
bills$DateIndex<-1:nrow(bills)
ggplot()+geom_line(data=bills,aes(x=DateIndex,y=Bill))
attach(bills)
Acf(Bill)
Pacf(Bill)
library(car)
## Loading required package: carData
durbinWatsonTest(lm(Bill~1),max.lag=4)
## lag Autocorrelation D-W Statistic p-value
## 1 0.7380876 0.469104 0.000
## 2 0.3577343 1.143470 0.008
## 3 -0.0194295 1.795373 0.764
## 4 -0.3676480 2.400021 0.062
## Alternative hypothesis: rho[lag] != 0
AR1<-arima(Bill,order=c(1,0,0))
AR2<-arima(Bill,order=c(2,0,0))
AR3<-arima(Bill,order=c(3,0,0))
tsdisplay(residuals(AR1),lag.max=15,main="AR(1) Resid. Diagnostics")
#Fit AR(4) model
AR4 <- arima(Bill,order = c(4,0,0))
#Fit AR(5) model
AR5 <- arima(Bill,order = c(5,0,0))
#Look at residual diagnostics of all 5 models
tsdisplay(residuals(AR1),lag.max=15,main="AR(1) Resid. Diagnostics")
tsdisplay(residuals(AR2),lag.max=15,main="AR(2) Resid. Diagnostics")
tsdisplay(residuals(AR3),lag.max=15,main="AR(3) Resid. Diagnostics")
tsdisplay(residuals(AR4),lag.max=15,main="AR(4) Resid. Diagnostics")
tsdisplay(residuals(AR5),lag.max=15,main="AR(5) Resid. Diagnostics")
AIC(AR1)
## [1] 470.0568
AIC(AR2)
## [1] 464.3603
AIC(AR3)
## [1] 464.4424
AIC(AR4)
## [1] 458.2644
AIC(AR5)
## [1] 460.2588
Based on the output above, AR(4) performs best on the metrics we care about, with an AIC of 458.2644. Its residual diagnostics also show the most evidence of an uncorrelated series: the ACF and PACF look as we would expect for uncorrelated residuals.
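For completeness, the order comparison can be written as a loop. This sketch uses simulated data (the electric-bill series is not reproduced here), so it only illustrates the bookkeeping, not the AR(4) result above:

```r
# Fit AR(1)..AR(5) and compare AICs in one pass (simulated series)
set.seed(1)
y <- arima.sim(list(ar = c(0.9, -0.4)), 200)
aics <- sapply(1:5, function(p) AIC(arima(y, order = c(p, 0, 0))))
names(aics) <- paste0("AR(", 1:5, ")")
aics
names(which.min(aics))  # the order with the smallest AIC is preferred
```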
ARIMA.fit<-auto.arima(Bill,seasonal=FALSE)
ARIMA.fit
## Series: Bill
## ARIMA(2,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 mean
## 1.0811 -0.4344 326.1589
## s.e. 0.1419 0.1472 31.8184
##
## sigma^2 estimated as 5528: log likelihood=-228.18
## AIC=464.36 AICc=465.5 BIC=471.12
ARIMA.fit<-auto.arima(Bill,seasonal=FALSE,stepwise=FALSE)
ARIMA.fit
## Series: Bill
## ARIMA(4,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 mean
## 0.8751 -0.3024 0.1918 -0.4628 311.7293
## s.e. 0.1415 0.2025 0.1998 0.1473 14.7687
##
## sigma^2 estimated as 4386: log likelihood=-223.13
## AIC=458.26 AICc=460.81 BIC=468.4
plot(forecast(ARIMA.fit,h=10))
points(1:length(Bill),fitted(ARIMA.fit),type="l",col="blue")
plot(forecast(AR1,h=10))
points(1:length(Bill),fitted(ARIMA.fit),type="l",col="blue")
The confidence bands on the ARIMA(1,0,0) forecast are much wider, indicating less precision. Additionally, its forecast line is less specific and seems to just indicate a general linear trend.
plot(forecast(ARIMA.fit,h=100))
points(1:length(Bill),fitted(ARIMA.fit),type="l",col="blue")
I think that this forecast is reasonable given the request. The forecast follows the same pattern as the data until about 70 on the x axis, after which it dampens out toward the series mean.
plot(AvgTemp,Bill,xlab="Avg. Temperature")
ols<-lm(Bill~AvgTemp)
abline(ols)
text(80,200,paste("Cor=",round(cor(Bill,AvgTemp),2)))
holdout.test<-window(ts(Bill),start=36)
train<-Bill[1:35]
predictor<-AvgTemp[1:35]
simpleols<-arima(train,order=c(0,0,0),xreg=predictor)
tsdisplay(residuals(simpleols),lag.max=15,main="Resid. Diagnostics of OLS")
ARIMA.with.Pred<-auto.arima(train,xreg=predictor,stepwise=FALSE)
ARIMA.with.Pred
## Series: train
## Regression with ARIMA(4,0,0) errors
##
## Coefficients:
## ar1 ar2 ar3 ar4 intercept xreg
## -0.0106 -0.7357 -0.1832 -0.7500 -235.4348 8.3784
## s.e. 0.1173 0.1139 0.1138 0.1095 16.2467 0.2481
##
## sigma^2 estimated as 1136: log likelihood=-171.98
## AIC=357.97 AICc=362.11 BIC=368.85
tsdisplay(residuals(ARIMA.with.Pred),lag.max=15,main="Resid. Diagnostics with AR(4)")
plot(forecast(ARIMA.with.Pred,h=5,xreg=matrix(AvgTemp[36:40])))
points(1:length(train),fitted(ARIMA.with.Pred),type="l",col="blue")
points(1:40,Bill,type="l")
newpred<-as.matrix(cbind(predictor,predictor^2))
colnames(newpred)<-c("Pred","Pred2")
ARIMA.with.Pred2<-auto.arima(train,xreg=newpred,stepwise=FALSE)
ARIMA.with.Pred2
## Series: train
## Regression with ARIMA(4,0,0) errors
##
## Coefficients:
## ar1 ar2 ar3 ar4 Pred Pred2
## 0.0382 -0.7339 -0.1458 -0.7136 0.6933 0.0602
## s.e. 0.1215 0.1201 0.1202 0.1153 0.2998 0.0044
##
## sigma^2 estimated as 1133: log likelihood=-171.58
## AIC=357.16 AICc=361.31 BIC=368.05
tsdisplay(residuals(ARIMA.with.Pred2),lag.max=15,main="Resid. Diagnostics AR(4) Quadratic")
test.pred<-as.matrix(cbind(AvgTemp[36:40],AvgTemp[36:40]^2))
colnames(test.pred)<-c("Pred","Pred2")
plot(forecast(ARIMA.with.Pred2,h=5,xreg=test.pred))
points(1:length(train),fitted(ARIMA.with.Pred2),type="l",col="blue")
points(1:40,Bill,type="l")
casts.avgtemp<-forecast(ARIMA.with.Pred,h=5,xreg=matrix(AvgTemp[36:40]))
accuracy(casts.avgtemp,Bill[36:40])
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 1.837313 30.67513 27.18185 -1.377615 10.48192 0.3929667 0.1406363
## Test set 18.592012 92.59637 89.16969 7.253808 21.57989 1.2891217 NA
cast.avgtemp.quad<-forecast(ARIMA.with.Pred2,h=5,xreg=test.pred)
accuracy(cast.avgtemp.quad,Bill[36:40])
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 1.717535 30.64328 26.88489 -1.975446 10.26099 0.3886736 0.1834575
## Test set 12.186182 89.18941 84.51464 5.260677 19.97953 1.2218239 NA